What is claimed is:






1. A method in a data processing system for isolating
failing hardware in the data processing system, the
method comprising:



responsive to detecting a recovery attempt from an
error for an operation involving a hardware component,
storing an indication of the attempt; and

responsive to the error exceeding a threshold,
placing the hardware component in an unavailable state.

2. The method of claim 1 further comprising: 

clearing the unavailable state of the hardware
component in response to a hot-plug action replacing the
hardware component.

3. The method of claim 1, wherein the placing step
comprises:

making a call to a hardware interface layer to place
the hardware component into a permanent reset state.

4. The method of claim 1, wherein the indication is
stored in an error log.

5. The method of claim 1 further comprising:

responsive to a selected number of recovery attempts
occurring, recreating the error.



6. The method of claim 1, wherein the error is an error 
caused by a PCI bus operation. 
7. The method of claim 1, wherein the detecting and
placing steps occur in a firmware layer within the data
processing system.

8. The method of claim 1, wherein the detecting step
occurs in a device driver and the placing step occurs in a
firmware.

9. The method of claim 1, wherein the threshold is the
error successively a selected number of times. 
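
The method of claims 1 through 9 can be illustrated with a minimal sketch. It is not the claimed implementation: the names `Component`, `HardwareInterfaceLayer`, and `FailureIsolator`, the default threshold of 3, and the choice to count consecutive errors are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical record for one hardware component (e.g. a PCI adapter)."""
    name: str
    available: bool = True
    consecutive_errors: int = 0
    error_log: list = field(default_factory=list)


class HardwareInterfaceLayer:
    """Stand-in for the hardware interface layer of claim 3 (assumed API)."""

    def hold_in_reset(self, component: Component) -> None:
        # Models the "permanent reset state": the component stays unavailable.
        component.available = False

    def release_from_reset(self, component: Component) -> None:
        component.available = True


class FailureIsolator:
    def __init__(self, hal: HardwareInterfaceLayer, threshold: int = 3):
        self.hal = hal
        self.threshold = threshold  # assumed value; the claims do not fix one

    def on_recovery_attempt(self, component: Component, error: str) -> bool:
        """Log the attempt (claims 1 and 4); isolate once the threshold is exceeded."""
        component.error_log.append(error)
        component.consecutive_errors += 1
        if component.consecutive_errors > self.threshold:
            self.hal.hold_in_reset(component)
        return component.available

    def on_successful_operation(self, component: Component) -> None:
        # Assumption: a success breaks the run of successive errors (claim 9).
        component.consecutive_errors = 0

    def on_hot_plug_replace(self, component: Component) -> None:
        """Clear the unavailable state after a hot-plug replacement (claim 2)."""
        component.consecutive_errors = 0
        self.hal.release_from_reset(component)
```

With `threshold=2`, for example, the first two recovery attempts leave the component available and the third places it in the unavailable state until `on_hot_plug_replace` is called.
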
10. A method in a data processing system for handling
errors, the method comprising: 

responsive to an occurrence of an error, determining 
whether the error is a recoverable error;

responsive to a determination that the error is a 
recoverable error, identifying slots on the bus 
indicating an error state; 

incrementing an error counter for each identified
slot; and 

responsive to the error counter exceeding a
threshold, placing the slot into a permanently
unavailable state.

11. The method of claim 10 further comprising:

responsive to the error counter failing to exceed
the threshold, placing the slot into an available state,
wherein a device within the slot resumes functioning.
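
The per-slot handling of claims 10 and 11 can be sketched as follows. This is only an illustration under stated assumptions: `SlotState`, `Slot`, and `handle_bus_error` are hypothetical names, the default threshold of 3 is arbitrary, and the caller is assumed to have already classified the error as recoverable or not.

```python
from dataclasses import dataclass
from enum import Enum


class SlotState(Enum):
    AVAILABLE = "available"
    ERROR = "error"
    PERMANENTLY_UNAVAILABLE = "permanently unavailable"


@dataclass
class Slot:
    """Hypothetical model of one slot on the bus."""
    number: int
    state: SlotState = SlotState.AVAILABLE
    error_count: int = 0


def handle_bus_error(slots, recoverable, threshold=3):
    """Apply the steps of claims 10 and 11 to the slots on one bus.

    Returns the slots that were found in an error state.
    """
    if not recoverable:
        # Non-recoverable errors are outside what these claims describe.
        return []
    # Identify slots on the bus indicating an error state.
    flagged = [s for s in slots if s.state is SlotState.ERROR]
    for slot in flagged:
        slot.error_count += 1
        if slot.error_count > threshold:
            # Counter exceeded the threshold: isolate the slot for good.
            slot.state = SlotState.PERMANENTLY_UNAVAILABLE
        else:
            # Counter did not exceed the threshold: the device resumes.
            slot.state = SlotState.AVAILABLE
    return flagged
```

Slots that never report an error keep a counter of zero, so a single faulty adapter is isolated without affecting its neighbors on the same bus.
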
12. A data processing system comprising:
a bus system;

a communications unit connected to the bus system;

a memory connected to the bus system, wherein the
memory includes a set of instructions; and

a processing unit connected to the bus system,
wherein the processing unit executes the set of
instructions to store an indication of a recovery attempt
from an error in response to detecting the recovery
attempt; and place the hardware component in an
unavailable state in response to the error exceeding a
threshold.

13. A data processing system comprising:
a bus system;

a communications unit connected to the bus system;

a memory connected to the bus system, wherein the
memory includes a set of instructions; and

a processing unit connected to the bus system,
wherein the processing unit executes the set of
instructions to determine whether the error is a
recoverable error in response to an occurrence of an
error; identify slots on the bus indicating an error
state in response to a determination that the error is a
recoverable error; increment an error counter for each
identified slot; and place the slot into a permanently
unavailable state in response to the error counter
exceeding a threshold.
14. A data processing system for isolating failing
hardware in the data processing system, the data
processing system comprising:

storing means, responsive to detecting a recovery
attempt from an error, for storing an indication of the
attempt; and

placing means, responsive to the error occurring in
the more than a threshold for a hardware component, for
placing the hardware component in an unavailable state.



15. The data processing system of claim 14 further
comprising:

clearing means for clearing the unavailable state of
the hardware component in response to a hot-plug action
replacing the hardware component.

16. The data processing system of claim 14, wherein the 
placing means comprises: 

means for making a call to a hardware interface
layer to place the hardware component into a permanent
reset state. 

17. The data processing system of claim 14, wherein the 
indication is stored in an error log.



18. The data processing system of claim 14 further 
comprising: 

recreating means, responsive to a selected number of
recovery attempts occurring, for recreating the error. 

19. The data processing system of claim 14, wherein the
error is an error caused by a PCI bus operation.
20. The data processing system of claim 14, wherein the
detecting means and the placing means are located in a
firmware layer within the data processing system. 

21. The data processing system of claim 14, wherein the 
detecting means is located in a device driver and the 
placing means is located in a firmware. 

22. The data processing system of claim 14, wherein the
threshold is the error successively a selected number of
times.

23. A data processing system for handling errors, the 
data processing system comprising:

determining means, responsive to an occurrence of an
error, for determining whether the error is a recoverable
error;

identifying means, responsive to a determination
that the error is a recoverable error, for identifying
slots on the bus indicating an error state;

incrementing means for incrementing an error counter
for each identified slot; and

placing means, responsive to the error counter
exceeding a threshold, for placing the slot into a
permanently unavailable state.
24. The data processing system of claim 23, wherein the
placing means is a first placing means and further
comprising:

second placing means, responsive to the error
counter failing to exceed the threshold, for placing the
slot into an available state, wherein a device within the
slot resumes functioning.

25. A computer program product in a computer readable 
medium for isolating failing hardware in the data
processing system, the computer program product
comprising:

first instructions, responsive to detecting a
recovery attempt from an error, for storing an indication
of the attempt; and

second instructions, responsive to the error
occurring in the more than a threshold for a hardware
component, for placing the hardware component in an
unavailable state.

26. The computer program product of claim 25 further
comprising:

third instructions for clearing the unavailable
state of the hardware component in response to a hot-plug
action replacing the hardware component.

27. The computer program product of claim 25, wherein 
the placing step comprises:

third instructions for making a call to a hardware
interface layer to place the hardware component into a
permanent reset state.

28. The computer program product of claim 25, wherein 
the indication is stored in an error log. 
29. The computer program product of claim 25 further
comprising:

third instructions, responsive to a selected number
of recovery attempts occurring, for recreating the error.

30. The computer program product of claim 25, wherein
the error is an error caused by a PCI bus operation.

31. The computer program product of claim 25, wherein
the detecting and placing steps occur in a firmware layer
within the data processing system.
32. The computer program product of claim 25, wherein
the detecting step occurs in a device driver and the placing
step occurs in a firmware.
33. The computer program product of claim 25, wherein
the threshold is the error successively a selected number
of times.

34. A computer program product in a computer readable 
medium for handling errors, the computer program product
comprising:

first instructions, responsive to an occurrence of
an error, for determining whether the error is a
recoverable error;

second instructions, responsive to a determination
that the error is a recoverable error, for identifying
slots on the bus indicating an error state;

third instructions for incrementing an error counter
for each identified slot; and

fourth instructions, responsive to the error counter
exceeding a threshold, for placing the slot into a
permanently unavailable state.



35. The computer program product of claim 34 further
comprising:

fifth instructions, responsive to the error counter
failing to exceed the threshold, for placing the slot
into an available state, wherein a device within the slot
resumes functioning.



